Author: ExtremeCoders © 2014
E-mail: extremecoders@mail.com

According to Wikipedia, "Dropbox is a file hosting service operated by Dropbox, Inc., headquartered in San
Francisco, California, that offers cloud storage, file synchronization, personal cloud, and 
client software. Dropbox allows users to create a special folder on each of their 
computers, which Dropbox then synchronizes so that it appears to be the same folder 
(with the same contents) regardless of which computer is used to view it. Files placed in 
this folder also are accessible through a website and mobile phone applications."

Dropbox provides client software for Microsoft Windows, Mac OS X, Linux, Android, iOS,
BlackBerry OS and web browsers, as well as unofficial ports to Symbian, Windows
Phone, and MeeGo. The Dropbox client software is written in Python so that a single
codebase can be deployed to a wide variety of platforms and architectures. Another benefit
of using Python is ease of coding and reduced time for testing and deployment. However,
Python poses problems too, such as the relative ease of reversing & decompiling
compared to native applications. For a closed source application like Dropbox this
is a serious issue, and something must be done to prevent users from getting access to the
source code. To this end, the Dropbox client on Windows is shipped as an .exe file. The
executable is generated using py2exe, which serves two purposes: firstly, it is a lot
easier for the end user to install than fiddling with a bunch of .pyc files, and secondly,
and most importantly, it prevents over-enthusiastic users from peeking into the source.
So good luck and bon voyage on this reversing journey.

Introduction

After installing Dropbox (which installs silently without any user
intervention) we can navigate to the above folder to find the main Dropbox binary. Since
we already know it is written in Python and has been packaged by py2exe, we will not
waste any time running it through a PE detection tool like PEiD. Python code in the file
system can reside in two forms: plaintext .py or compiled .pyc.
In the latter case, the file is not directly readable in a text editor but can be run through a
Python bytecode decompiler in order to obtain the plaintext source code. In the case of
Dropbox, these .pyc files are packaged inside the executable. So our first step is to unpack
the executable to get hold of the .pyc files.

Unpacking the executable

Now let's discuss a
bit about py2exe and its innards. It is a tool which packages python scripts into a 
windows executable along with an embedded python interpreter (actually I am 
oversimplifying here). This executable can then be run on a windows system as a 
standalone file without the necessity of installing python. All the necessary scripts 
required for the software to run are packaged within it. Now the .pyc files necessary for 
the application to run are packaged as a zip archive. This archive is simply concatenated to
the end of the py2exe loader stub, namely run_w.exe or run.exe for windowed and console
applications respectively. The Python interpreter, on the other hand, is packaged as a
resource. At runtime the executable fires up the embedded Python interpreter, which
is a .dll on Windows. The dynamically linked library is
loaded entirely from memory, and since Windows does not allow a PE to be loaded from
memory, py2exe comes with its own PE loader. We can look into the file
MemoryModule.c within the py2exe source code for the details.

So for unpacking we have to do two things: first, grab the zip archive containing
the .pyc files, and then extract the Python dll embedded as a resource within the executable.
For the second objective, we can use any decent resource editor. Here I am using PE
Explorer. Note that besides the resource PYTHON27.DLL, there is another one named
PYTHONSCRIPT. This contains a set of start-up scripts (strictly speaking they are not scripts,
since they are compiled and not directly readable in a text editor) which are run before
the application is initialized. Their purpose is to set up import hooks which
load the .pyc files from within the executable. Normally, without them, Python can
only load .pyc files from the file system or from a normal zip archive. Since the .pyc files
are packaged in an archive concatenated to the PE, they need special treatment, i.e. import
hooks, to load them. Once the import hooks have been set up, whenever Python wants to
import a module, the hooks are called; they bypass the regular import
mechanism and load the .pyc files from the executable. So in short they act like a proxy. The
advantage of this is that the files do not need to reside on the file system in order to be
loaded. They can be anywhere!

Okay, now the first objective. For extracting the zip
archive, we load the file into Exeinfo PE and dump the overlay. If we want to automate
some of the steps, we can use the tool py2exe dumper.
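The idea of an archive appended to the loader stub can be tried out without any special tooling. The following is only a minimal sketch, written for a current Python 3 rather than the Python 2.7 used elsewhere in this tutorial. It relies on the fact that the standard zipfile module locates an archive through the record at the end of the file and therefore tolerates data in front of it. The function name list_overlay_members and the fake stub bytes are made up for illustration; a real Dropbox executable is not needed to run it.

```python
import io
import zipfile


def list_overlay_members(exe_bytes):
    """Return the names stored in a zip archive appended to a loader stub."""
    # zipfile finds the archive through the end-of-central-directory record at
    # the tail of the file, so the stub bytes in front of it are tolerated.
    with zipfile.ZipFile(io.BytesIO(exe_bytes)) as archive:
        return archive.namelist()


# Build a stand-in "executable": made-up stub bytes followed by a real zip archive.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as archive:
    archive.writestr("authenticate.pyc", b"\x07\xf3\r\n" + b"\x00" * 4)
fake_exe = b"MZ" + b"\x90" * 64 + payload.getvalue()

members = list_overlay_members(fake_exe)
print(members)
```

Whether this works unchanged on a given py2exe build is an assumption; dumping the overlay with a PE tool as described above remains the reference route.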

Inspecting the pyc files

Once we have extracted the embedded zip archive from
the executable, we can see that it contains .pyc files. Opening any such file in a hex editor
reveals that the file is encrypted. There are no readable strings at all, whereas a normal pyc file
generally has some. Further, pyc files begin
with a 4 byte magic number followed by a 4 byte timestamp. In this case, the file begins
with the value 07 F3 0D 0A, which is different from the normal Python 2.7 magic value
of 03 F3 0D 0A. Obviously, we cannot decompile the file as is. We need to decrypt it;
otherwise no decompiler will show any interest in reading it.

Figuring out the decryption process

Right now we are in the dark about the encryption algorithm used and how to decrypt it.
The Dropbox application must load these modules, which means internally it must
decrypt them before it can use them. So if we can grab them from memory after they have
been decrypted, we get the decryption for free. Let's see if this method is feasible.
We will code a program in C which embeds the Dropbox Python interpreter
(python27.dll) we obtained previously. The purpose of the program is to run any
Python script in the context of Dropbox. In order to compile and link we will use the
header files and import libraries provided with the standard Python 2.7 distribution. Let's
name the output file embedder.exe.

    #include "Python.h"
    #include "marshal.h"
    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    //The script to run will be provided as an argument
    int main(int argc, char *argv[])
    {
        if (argc < 2)
        {
            printf("No script specified to run\n");
            return 1;
        }

        HANDLE hand = CreateFileA(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        //CreateFileA reports failure with INVALID_HANDLE_VALUE, not NULL
        if (hand == INVALID_HANDLE_VALUE)
        {
            printf("Failed to open file %s\n", argv[1]);
            return 1;
        }

        //Strip the file name to get the directory of the executable
        char pathname[MAX_PATH];
        GetModuleFileNameA(NULL, pathname, sizeof(pathname));
        char *c = strrchr(pathname, '\\');
        *c = 0;

        Py_NoSiteFlag = 1;           //We do not need to load site modules
        Py_SetPythonHome(pathname);  //Setting the path to the python libraries
        Py_Initialize();

        //Allocating a zero filled buffer to hold the file's contents
        DWORD size = GetFileSize(hand, NULL);
        void *buf = calloc(size + 1, 1);

        //Reading the file into the buffer
        DWORD dummy;
        ReadFile(hand, buf, size, &dummy, NULL);
        CloseHandle(hand);

        PyGILState_STATE gstate;
        gstate = PyGILState_Ensure();
        PyRun_SimpleString((char*) buf);  //Running the python script located in the buffer
        PyGILState_Release(gstate);

        Py_Finalize();
        free(buf);
        return 0;
    }

Using the above program, we
can run any Python script in the context of the Dropbox Python interpreter. Now we will
write a Python script which loads those encrypted pyc files using the marshal module.
Hopefully, that will decrypt them too. After that we will dump the result back to disk.

    import marshal

    infile = open('authenticate.pyc', 'rb')   # The encrypted pyc file
    infile.seek(8)   # Skip the header, it consists of magic value & timestamp of 4 bytes each
    code_obj = marshal.load(infile)   # Unmarshal the file
    outfile = open('decrypted.pyc', 'wb')   # The output file
    outfile.write('\x03\xf3\x0d\x0a\x00\x00\x00\x00')   # Write the header
    marshal.dump(code_obj, outfile)   # Dump back to file
    outfile.close()
    infile.close()
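The 8 byte header that the script skips (magic value plus timestamp) can be inspected on its own. The sketch below is an illustration only, written for a current Python 3 so that it runs standalone; the helper names read_pyc_header and looks_standard are hypothetical, and the sample bytes are fabricated from the two magic values quoted in this tutorial rather than taken from a real Dropbox file.

```python
import io
import struct

STANDARD_PY27_MAGIC = b"\x03\xf3\x0d\x0a"  # 03 F3 0D 0A, the normal Python 2.7 value


def read_pyc_header(stream):
    """Read the 8 byte Python 2.7 pyc header: 4 byte magic + 4 byte timestamp."""
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("file too short to be a pyc")
    magic = header[:4]
    (timestamp,) = struct.unpack("<I", header[4:])
    return magic, timestamp


def looks_standard(magic):
    """True if the magic matches the stock Python 2.7 value."""
    return magic == STANDARD_PY27_MAGIC


# Fabricated sample: the modified magic 07 F3 0D 0A, a zero timestamp, dummy payload.
sample = io.BytesIO(b"\x07\xf3\x0d\x0a" + b"\x00" * 4 + b"payload")
magic, timestamp = read_pyc_header(sample)
print(magic, timestamp, looks_standard(magic))
```

A check like this is a quick way to tell stock pyc files from modified ones before handing them to a decompiler.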

So let's run the script using the C program embedding Python. We need to pass
the name of the script file as an argument. However, the results are not encouraging.
It does generate a file decrypted.pyc, but on examining its contents we see that the file
is basically empty. Dumping to disk has failed. Apart from the header, only the hex byte 4E
(the marshal type code for None) has been written. We need to analyse the Dropbox Python
interpreter to see why this has happened.

PyPy to the rescue

CPython provides a function
PyMarshal_WriteObjectToFile, which dumps a code object to a file on disk. Internally,
this calls another function, w_object, which does the actual work. In the case of Dropbox,
w_object has been patched to disable marshalling of code objects to disk. So to get
around the limitation we need an alternative. We need some code, preferably in Python,
which will do the marshalling for us. Luckily there is an implementation of Python called
PyPy which is written in Python itself. We can leverage PyPy's marshalling code to dump
the code objects in our case. The marshalling code is in the file _marshal.py, which can be
obtained from the PyPy source. Let's save the file as dropdump.py.

    # <<< Code from _marshal.py should be copied here as is >>>

    import marshal   # Import the built-in marshal module

    infile = open('authenticate.pyc', 'rb')
    infile.seek(8)
    code_obj = marshal.load(infile)
    outfile = open('decrypted.pyc', 'wb')
    outfile.write('\x03\xf3\x0d\x0a\x00\x00\x00\x00')
    dump(code_obj, outfile)   # Use PyPy's marshalling code
    outfile.close()
    infile.close()

However, on running it we are greeted with the following ungrateful message.

The mystery of the missing co_code

According to the Python documentation,
"co_code is a string representing the sequence of bytecode instructions". Every Python
code object has an array containing the bytecode to be executed. This array is
called co_code. It must always be present, otherwise a code object cannot exist.
That means the Dropbox Python interpreter is hiding it from us. As we cannot access
co_code from the Python layer, we need to delve deeper and try to access it from within
the native or assembly layer. We need the services of a debugger. We will use old &
faithful OllyDbg, but before that let's see the structure of a PyCodeObject.

    typedef struct {
        PyObject_HEAD
        int co_argcount;
        int co_nlocals;
        int co_stacksize;
        int co_flags;
        PyObject *co_code;    //This is the missing member
        PyObject *co_consts;
        PyObject *co_names;
        PyObject *co_varnames;
        PyObject *co_freevars;
        PyObject *co_cellvars;
        PyObject *co_filename;
        PyObject *co_name;
        int co_firstlineno;
        PyObject *co_lnotab;
        void *co_zombieframe;
        PyObject *co_weakreflist;
    } PyCodeObject;

The first member in the structure is PyObject_HEAD. It is defined as

    /* PyObject_HEAD defines the initial segment of every PyObject. */
    #define PyObject_HEAD \
        _PyObject_HEAD_EXTRA \
        Py_ssize_t ob_refcnt; \
        struct _typeobject *ob_type;

For release builds of Python, _PyObject_HEAD_EXTRA is empty. So essentially PyObject_HEAD
contains two members of 4 bytes each on a 32-bit build. The first member, ob_refcnt, holds
the reference count of this object, and the second, ob_type, points to a PyTypeObject variable
representing the type of this object. For example, if it points to PyCode_Type, then this
object is a code object. Just by looking at the second member of a PyObject in a debugger,
we can tell its type. This tiny bit of information will turn out to be very useful in our
endeavour. Thus in a standard Python distribution co_code is located at an offset of 24 bytes
(8 bytes of header plus four 4 byte integers) from the start of
PyCodeObject. This co_code is a pointer to a PyObject. It is in fact a pointer to a
PyStringObject (which is also a PyObject), as mentioned in the documentation. The
structure of a PyStringObject is as follows,

    typedef struct {
        PyObject_VAR_HEAD
        long ob_shash;
        int ob_sstate;
        char ob_sval[1];    //This array contains the string, its length is in PyObject_VAR_HEAD
    } PyStringObject;
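The offsets used in this section can be cross-checked by modelling the two structures with ctypes. This is a sketch under a stated assumption: a 32-bit Windows build where int, long, Py_ssize_t and pointers are all 4 bytes, which is why every field is declared as a 32-bit integer. The class names PyCodeObject32 and PyStringObject32 are made up for this illustration, PyObject_VAR_HEAD is taken to be PyObject_HEAD followed by ob_size, and the code is written for a current Python 3 so it can be run anywhere.

```python
import ctypes

U32 = ctypes.c_uint32  # stands in for int, long, Py_ssize_t and pointers (32-bit build)


class PyCodeObject32(ctypes.Structure):
    """Standard Python 2.7 PyCodeObject layout, 32-bit release build (assumed)."""
    _fields_ = [
        ("ob_refcnt", U32), ("ob_type", U32),            # PyObject_HEAD
        ("co_argcount", U32), ("co_nlocals", U32),
        ("co_stacksize", U32), ("co_flags", U32),
        ("co_code", U32), ("co_consts", U32), ("co_names", U32),
        ("co_varnames", U32), ("co_freevars", U32), ("co_cellvars", U32),
        ("co_filename", U32), ("co_name", U32),
        ("co_firstlineno", U32), ("co_lnotab", U32),
        ("co_zombieframe", U32), ("co_weakreflist", U32),
    ]


class PyStringObject32(ctypes.Structure):
    """Standard Python 2.7 PyStringObject layout, 32-bit release build (assumed)."""
    _fields_ = [
        ("ob_refcnt", U32), ("ob_type", U32), ("ob_size", U32),  # PyObject_VAR_HEAD
        ("ob_shash", U32), ("ob_sstate", U32),
        ("ob_sval", ctypes.c_char * 1),
    ]


co_code_offset = PyCodeObject32.co_code.offset
ob_sval_offset = PyStringObject32.ob_sval.offset
print(co_code_offset, ob_sval_offset)
```

The same model is handy later: reordering the fields of such a class is a quick way to test a guess about a modified layout against what the debugger shows.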

PyObject_VAR_HEAD is defined as

    #define PyObject_VAR_HEAD \
        PyObject_HEAD \
        Py_ssize_t ob_size; /* Number of items in variable part */

Within a
PyStringObject, ob_sval is located at an offset of 20 bytes from the start. We will use
these offsets while working in the debugger. These offset values are obtained from a
standard Python distribution. It is possible that Dropbox has changed the structure layout
so as to hinder reversing. We will see that shortly.

Exploring the structure of code object

To find out whether Dropbox has kept the structure layout intact or has modified it, we
will code a small C program. It will generate a code object using the PyCode_New
CPython function. We will run the program in OllyDbg and inspect the returned value.
We will check whether the offsets agree with what we obtained earlier. If they are
different, we will need to find the actual offsets.

    #include "Python.h"
    #include "marshal.h"
    #include <windows.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        Py_NoSiteFlag = 1;
        char filename[MAX_PATH];
        GetModuleFileNameA(NULL, filename, sizeof(filename));
        char *c = strrchr(filename, '\\');
        *c = 0;
        Py_SetPythonHome(filename);
        Py_Initialize();

        //This marker string can be used to find out the position of co_code within the code object
        PyObject *codestring = PyString_FromString("Marker String");
        PyObject *tuple = PyTuple_New(0);
        PyObject *string = PyString_FromString("");
        PyCodeObject *codeObject = PyCode_New(0, 0, 0, 0, codestring, tuple, tuple, tuple,
                                              tuple, tuple, string, string, 0, string);

        Py_Finalize();
        return 0;
    }

So debug the program in OllyDbg and set a breakpoint after the call to
PyCode_New. The returned value in eax will be the pointer to a PyCodeObject, which we
will inspect for anomalies. The returned value in this case is 0x0094E268.
We will follow the value in the dump. The returned object is indeed a PyCodeObject, as is
evident from the second member. Now we need to verify whether the value at an offset of 24 is
indeed a pointer to co_code. So we follow the value at 0x0094E280 in the dump to reach
the object it points to. Remember that co_code is actually a PyStringObject. But this is a
PyTupleObject. We were expecting to find a PyStringObject here. This means Dropbox has fiddled with
the layout of PyCodeObject, as we suspected earlier. So we need to find the actual offset
of co_code. From this juncture we can take two paths. We can either follow each member
of PyCodeObject in the dump to see which one is a PyStringObject containing our marker string,
or we can write a small C program to do the job. During my reversing session I took the
first path, but in this tutorial I will demonstrate the second one and then verify
the result in the debugger. We will modify the previous program a little to find out
the offset of co_code.

    #include "Python.h"
    #include "marshal.h"
    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        Py_NoSiteFlag = 1;
        char filename[MAX_PATH];
        GetModuleFileNameA(NULL, filename, sizeof(filename));
        char *c = strrchr(filename, '\\');
        *c = 0;
        Py_SetPythonHome(filename);
        Py_Initialize();

        PyObject *codestring = PyString_FromString("Marker String");
        PyObject *tuple = PyTuple_New(0);
        PyObject *string = PyString_FromString("");
        PyCodeObject *codeObject = PyCode_New(0, 0, 0, 0, codestring, tuple, tuple, tuple,
                                              tuple, tuple, string, string, 0, string);

        //Walk the object one 4 byte slot at a time, looking for the pointer to our marker string
        char *ptr;
        for (ptr = (char*)codeObject; ptr < (char*)codeObject + sizeof(PyCodeObject); ptr += 4)
            if (*((PyObject**)ptr) == codestring)
                printf("co_code found at offset %d\n", (int)(ptr - (char*)codeObject));

        Py_Finalize();
        return 0;
    }

Running the program reports that co_code is found at offset 56.
To verify the result we will use the debugger. We will follow the value at an offset
of 56 (i.e. 0x00953820) from the start of PyCodeObject. It is indeed a PyStringObject, as it
should be, and if we change the display to hex we will see our marker string. So
Dropbox has modified the structure of PyCodeObject. Now co_code is located at an offset
of 56 instead of 24.
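The scanning idea, stepping through the object one pointer-sized slot at a time until the known marker pointer turns up, can be illustrated on a plain byte buffer. This is only a simulation written for a current Python 3: the 72 byte snapshot, the filler value and the address MARKER_ADDRESS are all invented, and the real object in the Dropbox interpreter may well have a different size.

```python
import struct

POINTER_SIZE = 4  # 32-bit build, as in the tutorial


def find_pointer_offset(object_bytes, pointer_value):
    """Return the first offset whose 4 byte little-endian word equals pointer_value."""
    for offset in range(0, len(object_bytes) - POINTER_SIZE + 1, POINTER_SIZE):
        (word,) = struct.unpack_from("<I", object_bytes, offset)
        if word == pointer_value:
            return offset
    return None  # marker pointer not present in this snapshot


# Hypothetical snapshot of a code object: every slot holds filler except the
# slot at offset 56, which holds the made-up address of the marker string.
MARKER_ADDRESS = 0x00A1B2C0
slots = [0x11111111] * 18
slots[56 // POINTER_SIZE] = MARKER_ADDRESS
snapshot = struct.pack("<18I", *slots)

found = find_pointer_offset(snapshot, MARKER_ADDRESS)
print(found)
```

Using a unique marker object is what makes the search unambiguous; a pointer shared by several members (such as the empty tuple in the C program) would match more than one slot.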

Getting access to co_code

Okay, co_code is located at an offset of 56, but we
cannot access it from the Python layer. We need access to it so that we can marshal the code
object to disk, and for this purpose we will code a C extension. The extension will contain
a function which, when fed a PyCodeObject, returns its co_code.

    #include <Python.h>

    static PyObject* getCode(PyObject* self, PyObject* args)
    {
        PyObject* code = NULL;
        PyObject* co_code = NULL;
        if (!PyArg_ParseTuple(args, "O", &code))
            return NULL;    //Propagate the argument error to the caller
        _asm
        {
            mov eax, code
            mov eax, dword ptr [eax + 56]    //co_code is located at an offset of 56
            mov co_code, eax
        }
        Py_XINCREF(co_code);    //Increase the reference count
        return co_code;
    }

    static PyMethodDef extension_methods[] = {
        {"getCode", getCode, METH_VARARGS, "Get Code Object"},
        {NULL, NULL, 0, NULL}
    };

    PyMODINIT_FUNC initdropextension()
    {
        //The name of the extension module as seen from python.
        //This should be the same as the name of the extension file.
        //In this case the file will be named dropextension.pyd
        Py_InitModule("dropextension", extension_methods);
    }

We need to
modify dropdump.py so that it uses our extension for accessing co_code. We will only
modify the function dump_code. The rest will remain the same.

    import dropextension    # Import the extension

    def dump_code(self, x):
        self._write(TYPE_CODE)
        self.w_long(x.co_argcount)
        self.w_long(x.co_nlocals)
        self.w_long(x.co_stacksize)
        self.w_long(x.co_flags)
        self.dump(dropextension.getCode(x))    # Use our extension to access co_code
        self.dump(x.co_consts)
        self.dump(x.co_names)
        self.dump(x.co_varnames)
        self.dump(x.co_freevars)
        self.dump(x.co_cellvars)
        self.dump(x.co_filename)
        self.dump(x.co_name)
        self.w_long(x.co_firstlineno)
        self.dump(x.co_lnotab)

Is that enough?

Let's now try to marshal the code using the newly coded tools.
This time there are no error messages. We can find the newly created file decrypted.pyc.
Let's open it in a hex editor. This time the file is not empty. If we scroll down a bit we can
find some readable strings. This means the file has been decrypted and marshalling has
succeeded. All that is left is to decompile the file.

We will be using Easy Python Decompiler. Let's try to decompile decrypted.pyc.
Tough luck! Failure once again. This time it says "Invalid pyc/pyo file". Decompilers
do fail at times, but in this case the file really is invalid. To verify this we will use
pychrysanthemum, a tool for inspecting pyc files. From the summary
tab everything looks normal. Let's have a look at the Disassembly tab.

It definitely shows some disassembled code, but if we observe carefully we will
find that all of it is junk. We find many STOP_CODE instructions. However, the Python
documentation says STOP_CODE "Indicates end-of-code to the compiler, not used by the
interpreter". So we should not find this opcode in a compiled Python file. If we
scroll down further in the disassembly, we will see other instances strongly suggesting
that the code is junk. In addition to the STOP_CODE instructions there are several other opcodes
which it could not disassemble. That means our task is not over yet. We need to convert
this junk code to something comprehensible to a disassembler & a decompiler.

The use of opcode remapping

Opcode remapping is a technique in which the
opcode definitions of a virtual machine are changed. In the case of Python it means that the
opcodes have different meanings from those of a standard Python distribution. Thus if the
opcode 23 originally meant BINARY_ADD, it may now mean POP_TOP. To be able to
decompile the file successfully we need to obtain the new opcode mapping. Using that we
can definitely decompile the pyc file. In the case of Dropbox we are confident that it uses
this trick, as all other facets of the pyc file are perfectly normal. Only the bytecode
instructions cannot be deciphered. Now ponder for a moment. Suppose we compile a Python
script in this modified Python interpreter and compare the result with the output generated by
compiling the same code in a standard Python 2.7 interpreter. There will be some
differences. These differences will be due to the fact that the modified interpreter uses a
different set of opcodes. The rest should be the same. So using this we should be able to find out
which opcodes were remapped and to what new values.

Generating the file set

So for now
we need to generate two sets of pyc files: one compiled with Dropbox Python and the
other with standard Python 2.7. We need several Python script files such that by
compiling them we can generate almost all opcodes used by Python (Python 2.7 has 118
opcodes). To ease our search we can use the py files provided in a standard distribution
(there are more than a thousand files). That should hopefully generate the majority of
opcodes, if not all. In fact we will see later that even after using more than a thousand files
there are about 5 opcodes left. We could ignore them for the sole reason that if we cannot
find any usage of those opcodes even after comparing more than a thousand files, those
opcodes are probably not used normally. An example of such an opcode is
EXTENDED_ARG. We will use the same code developed earlier in embedder.exe &
dropextension.pyd. For generating the first set of files, we will link against the
standard Python dll. For the second, we need to link with the Dropbox Python dll along
with the C extension. The extension is only needed in the second case, as there is no
access to co_code. However, in both cases the code of embedder need not be changed. For
generating the first set of reference files, we will prepare a Python script. The name of the
script file will be passed as an argument to embedder as we have been doing earlier. The
purpose of this script is to load a py file from disk, compile it to generate a code
object, and then marshal it back to disk using PyPy's marshalling code. We will use
PyPy's marshalling code instead of the built-in one for consistency.

    # <<< code from _marshal.py should be copied here as is >>>

    import os

    basedir = os.getcwd()
    py_files = os.path.join(basedir, 'py_files')      # The source py files will be located here
    out_files = os.path.join(basedir, 'org_opcodes')  # The output files will go here

    for f in os.listdir(py_files):
        txt = open(os.path.join(py_files, f)).read()
        of = open(os.path.join(out_files, f + '.org'), 'wb')
        cobj = compile(txt, '', 'exec')   # Compile the code
        dump(cobj, of)   # Marshal using PyPy's marshalling code
        of.close()

Running the
above script using embedder generates the first set of reference files.
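The compile-then-marshal step can be tried in isolation. The sketch below is written for a current Python 3 and uses the built-in marshal module as a stand-in for PyPy's dump(), so it illustrates the workflow rather than reproducing the exact scripts above. The helper iter_code_objects is a hypothetical name; it shows why a later comparison has to recurse into co_consts, since nested functions are stored there as separate code objects.

```python
import marshal
import types

SOURCE = """
def outer():
    def inner():
        return 1
    return inner
"""


def iter_code_objects(code):
    """Yield a code object and, recursively, every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            for nested in iter_code_objects(const):
                yield nested


top_level = compile(SOURCE, "", "exec")   # same call shape as in the scripts above
blob = marshal.dumps(top_level)           # built-in marshal, standing in for PyPy's dump()
restored = marshal.loads(blob)
names = [code.co_name for code in iter_code_objects(restored)]
print(names)
```

Each of the three code objects carries its own co_code string, which is why one source file yields several bytecode strings to compare.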

For generating the second set of reference files we will use the modified code of
_marshal.py as used in dropdump.py. The rest of the code is as follows,

    import os

    basedir = os.getcwd()
    py_files = os.path.join(basedir, 'py_files')       # The source py files will be located here
    out_files = os.path.join(basedir, 'drop_opcodes')  # The output files will go here

    for f in os.listdir(py_files):
        txt = open(os.path.join(py_files, f)).read()
        of = open(os.path.join(out_files, f + '.drop'), 'wb')
        cobj = compile(txt, '', 'exec')   # Compile the code
        dump(cobj, of)   # Marshal using PyPy's marshalling code
        of.close()

So the above
two code snippets are the same except for the output directory & the
extension of the output file. Running the above using the Dropbox Python dll and embedder
yields the second set of reference files.

Finding the opcode mapping

We have the two sets
of reference files generated from the same source. The difference between these two sets
of files should reveal the opcode mapping. We need to code a tool which will find the
differences. For simplicity we will code the tool in Python, although we could use
any other language here.

    import os, marshal

    opcodes = dict()   # Dictionary to store the opcodes

    def compare(ocode, ncode):
        orgCodeStr, dropCodeStr = bytearray(ocode.co_code), bytearray(ncode.co_code)
        # Make sure we are comparing strings of same length
        if len(orgCodeStr) == len(dropCodeStr):
            # Compare the code strings bytes
            for o, n in zip(orgCodeStr, dropCodeStr):
                if o != n:
                    if o not in opcodes.keys():
                        opcodes[o] = n
                    else:
                        if opcodes[o] != n:
                            print 'Two remapped opcodes for a single opcode, the files are out of sync, skipping'
                            break
        else:
            print 'Code Strings not of same length, skipping...'

        # Recursive scanning for more code objects
        for oconst, nconst in zip(ocode.co_consts, ncode.co_consts):
            if hasattr(oconst, 'co_code') and hasattr(nconst, 'co_code'):   # both should have co_code
                compare(oconst, nconst)

    def main():
        org_files = os.path.join(os.getcwd(), 'org_opcodes')
        new_files = os.path.join(os.getcwd(), 'drop_opcodes')

        for f in os.listdir(org_files):
            # Open the files
            of = open(os.path.join(org_files, f), 'rb')
            nf = open(os.path.join(new_files, f[0:-4] + '.drop'), 'rb')
            # unmarshal & compare opcodes
            compare(marshal.load(of), marshal.load(nf))
            of.close()
            nf.close()
        print opcodes

    if __name__ == '__main__':
        main()

Running the script gives us the following opcode map.

    {1: 15, 2: 59, 3: 60, 4: 13, 5: 49, 10: 48, 11: 54, 12: 38, 13: 25, 15: 34, 19: 28,
     20: 36, 21: 12, 22: 41, 23: 52, 24: 55, 25: 4, 26: 43, 27: 5, 28: 32, 29: 30, 30: 16,
     31: 17, 32: 18, 33: 19, 40: 61, 41: 62, 42: 63, 43: 64, 50: 44, 51: 45, 52: 46,
     53: 47, 54: 70, 55: 6, 56: 29, 57: 8, 58: 27, 59: 3, 60: 31, 61: 69, 62: 7, 63: 22,
     64: 50, 65: 21, 66: 2, 67: 57, 68: 39, 71: 9, 72: 14, 73: 33, 74: 35, 75: 11, 76: 58,
     77: 24, 78: 23, 79: 10, 80: 40, 81: 37, 82: 51, 83: 66, 84: 56, 85: 65, 86: 26,
     87: 1, 88: 67, 89: 42, 90: 105, 91: 104, 92: 103, 93: 91, 94: 83, 95: 94, 96: 97,
     97: 115, 98: 108, 99: 114, 100: 82, 101: 89, 102: 90, 103: 117, 104: 118, 105: 88,
     106: 96, 107: 111, 108: 98, 109: 99, 110: 119, 111: 120, 112: 122, 114: 123,
     115: 121, 116: 80, 119: 106, 120: 84, 121: 116, 122: 85, 124: 102, 125: 92, 126: 81,
     130: 101, 131: 112, 132: 86, 133: 87, 134: 95, 135: 107, 136: 109, 137: 110,
     140: 133, 141: 134, 142: 135, 143: 136, 146: 141, 147: 142}

If we observe carefully, we will find that the
following opcodes are missing from the map: 0, 9, 70, 113, 145. The meanings of these
opcodes are 0 -> STOP_CODE, 9 -> NOP, 70 -> PRINT_EXPR, 113 ->
JUMP_ABSOLUTE, 145 -> EXTENDED_ARG. The opcode 113 can be generated by
compiling the following snippet,

    def foo():
        if bar1:
            if bar2:
                print ''

By following the
comparison method devised earlier we can see that opcode 113 is left unchanged, which is
why it never showed up as a difference. Now
among the remaining four opcodes, STOP_CODE & NOP are never generated in
compiled bytecode, so we can ignore them. PRINT_EXPR is generated when the
interpreter is running in interactive mode, so we can ignore it too. The EXTENDED_ARG
opcode is generated whenever the argument of an instruction is too big to fit in
two bytes, for example a constant index or jump target above 65,535.
This is also a rare situation, so we can ignore it too. Now we have
recovered the opcode map. We have found which opcodes were changed
and to what new values. We need to incorporate this opcode map while marshalling the
code object to disk, i.e. before dumping we will scan co_code and change remapped
opcodes to original values, so that it can be disassembled & decompiled.
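The scan described here has one subtlety: only opcode bytes may be translated, never the argument bytes that follow them. The sketch below illustrates this on a four byte toy example and is written so that it runs on a current Python 3. It assumes the Python 2.7 instruction format (a one byte opcode, followed by a two byte argument when the standard opcode is 90 or above) and uses just two entries inverted from the map printed above (100 -> 82 for LOAD_CONST and 83 -> 66 for RETURN_VALUE). The function name restore_opcodes is made up for the illustration.

```python
HAVE_ARGUMENT = 90  # CPython 2.7: opcodes >= 90 are followed by a 2 byte argument

# Two entries inverted from the recovered map (100 -> 82, 83 -> 66).
DROPBOX_TO_STANDARD = {82: 100, 66: 83}


def restore_opcodes(co_code, inverse_map):
    """Translate remapped opcode bytes back, leaving argument bytes untouched."""
    code = bytearray(co_code)
    i = 0
    while i < len(code):
        original = inverse_map[code[i]]
        code[i] = original
        # Skip the two argument bytes so they are never mistaken for opcodes.
        i += 3 if original >= HAVE_ARGUMENT else 1
    return bytes(code)


# Remapped form of "LOAD_CONST 0; RETURN_VALUE" (standard bytes: 100 0 0 83).
remapped = bytes(bytearray([82, 0, 0, 66]))
restored = restore_opcodes(remapped, DROPBOX_TO_STANDARD)
print(list(bytearray(restored)))
```

Deciding the instruction length from the restored (standard) opcode rather than the remapped one matters, because the remapped values of argument-taking opcodes can fall below 90, as 82 does here.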

Opcode unmapping

To incorporate the newly found opcode map, we will reuse
the code of dropdump.py. The code will be modified as follows. We can name the file
unmapper.py.

    import dropextension, marshal

    remap = {0: 0, 113: 113, 145: 145, 20: 9, 30: 70, 15: 1, 59: 2, 60: 3, 13: 4, 49: 5,
             48: 10, 54: 11, 38: 12, 25: 13, 34: 15, 28: 19, 36: 20, 12: 21, 41: 22,
             52: 23, 55: 24, 4: 25, 43: 26, 5: 27, 32: 28, 16: 30, 17: 31, 18: 32, 19: 33,
             61: 40, 62: 41, 63: 42, 64: 43, 44: 50, 45: 51, 46: 52, 47: 53, 70: 54,
             6: 55, 29: 56, 8: 57, 27: 58, 3: 59, 31: 60, 69: 61, 7: 62, 22: 63, 50: 64,
             21: 65, 2: 66, 57: 67, 39: 68, 9: 71, 14: 72, 33: 73, 35: 74, 11: 75, 58: 76,
             24: 77, 23: 78, 10: 79, 40: 80, 37: 81, 51: 82, 66: 83, 56: 84, 65: 85,
             26: 86, 1: 87, 67: 88, 42: 89, 105: 90, 104: 91, 103: 92, 91: 93, 83: 94,
             94: 95, 97: 96, 115: 97, 108: 98, 114: 99, 82: 100, 89: 101, 90: 102,
             117: 103, 118: 104, 88: 105, 96: 106, 111: 107, 98: 108, 99: 109, 119: 110,
             120: 111, 122: 112, 123: 114, 121: 115, 80: 116, 106: 119, 84: 120,
             116: 121, 85: 122, 102: 124, 92: 125, 81: 126, 101: 130, 112: 131, 86: 132,
             87: 133, 95: 134, 107: 135, 109: 136, 110: 137, 133: 140, 134: 141,
             135: 142, 136: 143, 141: 146, 142: 147}

    def dump_code(self, x):
        self._write(TYPE_CODE)
        self.w_long(x.co_argcount)
        self.w_long(x.co_nlocals)
        self.w_long(x.co_stacksize)
        self.w_long(x.co_flags)
        code = bytearray(dropextension.getCode(x))
        c = 0
        while c < len(code):
            n = remap[code[c]]   # Using the opcode map
            code[c] = n
            c += 1 if n < 90 else 3   # Opcodes greater than 89 take a 2 byte parameter
        self.dump(str(code))
        self.dump(x.co_consts)
        self.dump(x.co_names)
        self.dump(x.co_varnames)
        self.dump(x.co_freevars)
        self.dump(x.co_cellvars)
        self.dump(x.co_filename)
        self.dump(x.co_name)
        self.w_long(x.co_firstlineno)
        self.dump(x.co_lnotab)

    inf = open('authenticate.pyc', 'rb')   # Load the file we wish to decompile
    inf.seek(8)   # Skip 8 byte header
    code = marshal.load(inf)   # Unmarshal using built in module
    inf.close()
    outf = open('decrypted.pyc', 'wb')
    outf.write('\x03\xf3\x0d\x0a\x00\x00\x00\x00')
    dump(code, outf)
    outf.close()

Note that each
key value pair in the remap dictionary is reversed. This is because we now
want to change the new opcode back to the original one. Also note that we have chosen
arbitrary values for opcodes 0, 9, 70, 145. We need to make sure that these chosen
arbitrary values do not clash with an existing value. We need to run the script in a similar
way using embedder. After that it should hopefully be curtains down.
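The requirement that the arbitrary values must not clash with an existing value can be checked mechanically. The sketch below, written to run on a current Python 3, inverts the recovered map exactly as printed in this document and tests the five placeholder entries from the remap dictionary against it. The helper names invert and find_clashes are made up. With the values as printed, the check reports 30: it is used as the placeholder for PRINT_EXPR (70), but it also appears as the remapped value of opcode 29, so an unused value would be a safer choice if the printed map is accurate.

```python
# Recovered map as printed in this document: standard opcode -> Dropbox opcode.
FORWARD = {
    1: 15, 2: 59, 3: 60, 4: 13, 5: 49, 10: 48, 11: 54, 12: 38, 13: 25, 15: 34,
    19: 28, 20: 36, 21: 12, 22: 41, 23: 52, 24: 55, 25: 4, 26: 43, 27: 5, 28: 32,
    29: 30, 30: 16, 31: 17, 32: 18, 33: 19, 40: 61, 41: 62, 42: 63, 43: 64,
    50: 44, 51: 45, 52: 46, 53: 47, 54: 70, 55: 6, 56: 29, 57: 8, 58: 27, 59: 3,
    60: 31, 61: 69, 62: 7, 63: 22, 64: 50, 65: 21, 66: 2, 67: 57, 68: 39, 71: 9,
    72: 14, 73: 33, 74: 35, 75: 11, 76: 58, 77: 24, 78: 23, 79: 10, 80: 40,
    81: 37, 82: 51, 83: 66, 84: 56, 85: 65, 86: 26, 87: 1, 88: 67, 89: 42,
    90: 105, 91: 104, 92: 103, 93: 91, 94: 83, 95: 94, 96: 97, 97: 115, 98: 108,
    99: 114, 100: 82, 101: 89, 102: 90, 103: 117, 104: 118, 105: 88, 106: 96,
    107: 111, 108: 98, 109: 99, 110: 119, 111: 120, 112: 122, 114: 123,
    115: 121, 116: 80, 119: 106, 120: 84, 121: 116, 122: 85, 124: 102, 125: 92,
    126: 81, 130: 101, 131: 112, 132: 86, 133: 87, 134: 95, 135: 107, 136: 109,
    137: 110, 140: 133, 141: 134, 142: 135, 143: 136, 146: 141, 147: 142,
}

# Placeholder entries from the remap dictionary: Dropbox value -> standard opcode.
PLACEHOLDERS = {0: 0, 113: 113, 145: 145, 20: 9, 30: 70}


def invert(mapping):
    """Swap keys and values, refusing to silently merge duplicate values."""
    inverse = {}
    for standard, remapped in mapping.items():
        if remapped in inverse:
            raise ValueError("two standard opcodes share remapped value %d" % remapped)
        inverse[remapped] = standard
    return inverse


def find_clashes(inverse, placeholders):
    """Return placeholder keys that already denote an observed remapped opcode."""
    return sorted(key for key in placeholders if key in inverse)


inverse = invert(FORWARD)
clashes = find_clashes(inverse, PLACEHOLDERS)
print(clashes)
```

Building the reverse dictionary from the forward map in code, instead of typing it by hand, also avoids dropping an entry by accident.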

The final results

Running it we get the following results. No errors. No messages.
A file decrypted.pyc is also created. Now it is time to open it in pychrysanthemum before
decompiling. This time there are no STOP_CODE or partially disassembled opcodes. The
coast looks clear. We can safely proceed to decompiling. Let's feed the file to Easy
Python Decompiler. Decompiling completed without errors. Now it is time to check out the
code and bask in the light of glory and success.

Page I 20 def finish_dropbox_boot(self, ret, freshly Jinked, wiz_ret, dropbox_f older): 
self.dropbox_app.is_freshly_linked = freshly Jinked if 
self .dropbox_app .mbox .is_secondary : try : 

self .dropbox_app .mbox .complete_link( self .dropbox_app .config .get( 'email' ) , 
ret.get('userdisplayname'), ret.get('uid'), self .dropbox_app .mbox .dual_link) except 
AttributeError: self.dropbox_app.mbox.callbacks.other_client_exiting() if freshly Jinked: 
clobber_symlink_at(self.dropbox_app.sync_engine.fs, dropbox_folder) 
self.dropbox_app.safe_makedirs(dropbox_folder, 448, False) TRACE('Freshly linked!') 
try: if arch. constants .platform == 'win': TRACE('Trying to create a shortcut on the 
Desktop.') _folder_name = self.dropbox_app.get_dropbox_folder_name() 
arch .util .add_shortcut_to_desktop(_folder_name , self .dropbox_app .mbox .is_secondary ) 
if self .dropbox_app .mbox .is_primary and _folder_name != 

arch.constants.default_dropbox_folder_name: TRACE('Attempting to remove shortcut 
named Dropbox since our folder name is %r', _folder_name) 

arch.util.remove_shortcut_from_desktop(arch.constants.default_dropbox_folder_name) 
except Exception: unhandled_exc_handler() The code above is a snippet of the 
decompiled code. Now we have access to the full source code of Dropbox. We can reverse 
engineer it, look for vulnerabilities, or even write an open source Dropbox client. The 
possibilities are many. With this we come to the end of this tutorial. Hope you liked it and 
thanks for reading. Some additional information is presented in the addendum. This 
is extremecoders signing off, Ciao! 

Addendum 
This supplement is provided to discuss some other features of the protection. We will 
also see what steps can be taken to further increase the protection. 

Exploring the differences 
The de facto tool for binary comparison is BinDiff from Zynamics (now owned by Google), 
but due to its price tag we will be using freely available tools. We will see what other 
changes Dropbox has made to the standard Python interpreter. Patchdiff2 is a great free 
alternative. It requires IDA to run. 

The flow graph on the left is from the standard Python and the one on the right is from 
the Dropbox Python DLL. Here we are comparing PyRun_FileExFlags. The function is used 
in CPython to execute a script associated with a file. In the case of Dropbox it has been 
patched to do nothing. We got around this limitation by using PyRun_SimpleString, which 
was unpatched. However, we had to read the contents of the file ourselves and pass the 
source code as a string to the function. Other functions such as PyRun_SimpleFileExFlags 
and PyRun_AnyFileExFlags have been patched similarly. 
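
The workaround of reading the file ourselves and passing its text to the string-based entry point can be imitated from Python through ctypes. This is a stand-in sketch only: the tutorial's embedder is a separate program, and the helper name run_file_as_string is made up here. It calls PyRun_SimpleStringFlags with a NULL flags pointer, which is what the CPython documentation describes PyRun_SimpleString as doing.

```python
import ctypes


def run_file_as_string(path):
    """Read a script ourselves and hand its text to the interpreter as a string.

    Returns 0 on success and -1 if the script raised an exception.
    """
    with open(path, "rb") as fh:
        source = fh.read()  # must not contain NUL bytes, it is passed as a C string

    run = ctypes.pythonapi.PyRun_SimpleStringFlags
    run.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
    run.restype = ctypes.c_int

    # The second argument is a PyCompilerFlags pointer; NULL means no special flags.
    return run(source, None)
```

The code runs in the namespace of the `__main__` module, just as it would with PyRun_SimpleString, so no file-based PyRun_* function is involved at any point.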

In the case of PyRun_AnyFileExFlags we see that a block on the right is missing: Dropbox 
has removed the call to PyRun_InteractiveLoopFlags. This function reads and executes 
statements from a file associated with an interactive device until EOF is reached. It is 
the function executed when we run python in a console window or terminal. The next image 
shows that PyRun_InteractiveOneFlags has also been patched to do nothing. With it 
patched, we cannot start an interactive Python console window (unless we take drastic 
measures such as implementing our own function). 

Refining the protection 
Here we will discuss what more could have been done to increase the protection. Firstly, 
Dropbox uses py2exe on Windows. The source code of py2exe is available, and its license 
allows modification. Dropbox Inc. could have modified the source to develop a custom 
version with additional protections such as debugger checks and encryption. The 
disadvantage is that the protection would need to be rewritten for other platforms such 
as Linux and Mac OS X, as py2exe is only for Windows. 

Coming to the CPython part, we used PyRun_SimpleString to inject our code. This function 
could have been patched too, as it is not needed. Leaving it in is like a hole in the armour. 

We used the built-in marshal module to load the encrypted code objects. The code objects 
were decrypted immediately after loading. This could be avoided. By modifying 
PyEval_EvalFrameEx, a PyCodeObject could be decrypted only when it is about to be 
executed. A new co_flag could indicate which code objects were already decrypted in a 
previous run, so that only code objects without the flag set are decrypted. Another good 
refinement to the encryption logic would be to decrypt on execution and re-encrypt after 
execution. With this second scheme we never reach a state in which all code objects are 
decrypted at once, which prevents a memory dump. 

To generate the file set needed for opcode remapping, we used the built-in function 
compile to compile source code to bytecode. This function was also not needed, as Dropbox 
already uses compiled pyc files in the zip file. Further, patching compile would result in 
an exception if we tried to inject code, as Python internally always compiles plain text 
source code to bytecode before execution. 

Lastly, there could have been improvements to opcode mapping. We could have used 
multiple opcode maps. 
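
One way to picture multiple opcode maps is a per-code-object selector that picks the translation table before dispatch. The sketch below borrows the example values this addendum itself uses (raw opcode 66 meaning INPLACE_DIVIDE under one map and POP_BLOCK under another). The table layout and the names OPCODE_MAPS, decode_opcode and decode_stream are assumptions made for illustration, and a real implementation would sit in C inside PyEval_EvalFrameEx rather than in Python.

```python
# Hypothetical tables: map selector -> {raw opcode on disk: real operation}.
OPCODE_MAPS = {
    1: {66: "INPLACE_DIVIDE"},
    2: {66: "POP_BLOCK"},
}


def decode_opcode(map_flag, raw_opcode):
    """Resolve a raw opcode using the map selected by the code object's flag."""
    table = OPCODE_MAPS.get(map_flag)
    if table is None:
        raise ValueError("unknown opcode map flag: %r" % (map_flag,))
    name = table.get(raw_opcode)
    if name is None:
        raise ValueError("opcode %d is not defined in map %r" % (raw_opcode, map_flag))
    return name


def decode_stream(map_flag, raw_opcodes):
    """Decode a whole sequence of raw opcodes that share one map selector."""
    return [decode_opcode(map_flag, op) for op in raw_opcodes]


for flag in (1, 2):
    print(flag, decode_opcode(flag, 66))
```

The same byte decodes differently depending on the selector, so an attacker who recovers one map still cannot read code objects that use another.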
That is, we create a new co_flag. Suppose the flag is named OPCODE_MAP_FLAG. If the flag 
is set to 1, opcode 66 may mean INPLACE_DIVIDE; if the flag is set to 2, the same opcode 
may mean POP_BLOCK. The flag would be checked in PyEval_EvalFrameEx before execution 
to know which set of opcodes the currently executing code object uses. Combined with the 
encryption protection, this would make Dropbox much tougher to crack. The only drawback 
of increasing the protection is that it may degrade run-time performance, but optimization 
is always 
possible. 

Links & References 
[x] Dropbox :: https://www.dropbox.com/ 
[x] Python :: https://www.python.org/ 
[x] Py2exe :: http://www.py2exe.org/ 
[x] PE Explorer :: http://www.heaventools.com/overview.htm 
[x] Exeinfo PE :: http://exeinfo.atwebpages.com/ 
[x] 010 Editor :: http://www.sweetscape.com/010editor/ 
[x] PyPy :: http://pypy.org/ 
[x] OllyDbg :: http://www.ollydbg.de/version2.html 
[x] Easy Python Decompiler :: http://sourceforge.net/projects/easypythondecompiler/ 
[x] Py2Exe Dumper :: http://sourceforge.net/projects/py2exedumper/ 
[x] Pychrysanthemum :: https://github.com/monkeycz/pychrysanthemum 
[x] Patchdiff2 :: https://code.google.com/p/patchdiff2/ 
[x] Security analysis of dropbox :: https://www.usenix.org/conference/woot13/workshop-program/presentation/kholia 
[x] Dropbox reversing tutorial :: http://progdupeu.pl/tutoriels/280/dropbox-a-des-fuites/ 



